Geo-located Twitter as the proxy for global mobility patterns
With the advent of pervasive location-sharing services,
researchers have gained unprecedented access to direct records of human
activity in space and time. This paper analyses geo-located Twitter messages in
order to uncover global patterns of human mobility. Based on a dataset of
almost a billion tweets recorded in 2012, we estimate volumes of international
travelers with respect to their country of residence. We examine the mobility
profiles of different nations, looking at characteristics such as mobility
rate, radius of gyration, diversity of destinations and the balance of
inflows and outflows. The temporal patterns disclose the universal seasons of
increased international mobility and the peculiar national nature of overseas
travel. Our analysis of the community structure of the Twitter mobility
network, obtained with iterative network partitioning, reveals spatially
cohesive regions that follow the regional division of the world. Finally, we
validate our results against global tourism statistics and mobility models
provided by other authors, and argue that Twitter is a viable source to
understand and quantify global mobility patterns. Comment: 17 pages, 13 figures
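The radius of gyration mentioned in the abstract above is commonly defined as the root-mean-square distance of a user's recorded positions from their centroid. The following is a minimal sketch of that definition, not the authors' code. It assumes the positions have already been projected to planar coordinates (for example, kilometres); the function name `radius_of_gyration` is hypothetical.

```python
import math


def radius_of_gyration(points):
    """Root-mean-square distance of the points from their centroid.

    `points` is a list of (x, y) pairs in a planar projection. This is an
    assumption: raw latitude/longitude pairs would need projecting first.
    """
    n = len(points)
    if n == 0:
        raise ValueError("need at least one point")
    # Centroid (centre of mass) of all recorded positions.
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Mean squared distance from the centroid.
    mean_sq = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / n
    return math.sqrt(mean_sq)
```

A user seen only at one place has a radius of gyration of zero, while a user alternating between two places 2 km apart has a radius of 1 km.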
Scaling of city attractiveness for foreign visitors through big data of human economical and social media activity
Scientific studies investigating laws and regularities of human behavior are
nowadays increasingly relying on the wealth of widely available digital
information produced by human social activity. In this paper we leverage big
data created by three different aspects of human activity (i.e., bank card
transactions, geotagged photographs and tweets) in Spain to quantify city
attractiveness for foreign visitors. An important finding of this paper is
a strong superlinear scaling of city attractiveness with its population size.
The observed scaling exponent stays nearly the same for different ways of
defining cities and for different data sources, emphasizing the robustness of
our finding. Temporal variation of the scaling exponent is also considered in
order to reveal seasonal patterns in the attractiveness. Comment: 8 pages, 3 figures, 1 table
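A scaling law of the form A = a · P^β (attractiveness A, population P) becomes a straight line in log-log space, so the exponent β can be estimated by ordinary least squares on the logarithms. The sketch below illustrates that standard approach. It is not taken from the paper, and the function name `fit_scaling_exponent` and its inputs are hypothetical.

```python
import math


def fit_scaling_exponent(populations, attractiveness):
    """Least-squares fit of log(A) = log(a) + beta * log(P).

    Returns (beta, a). beta > 1 indicates superlinear scaling.
    All inputs must be strictly positive.
    """
    xs = [math.log(p) for p in populations]
    ys = [math.log(a) for a in attractiveness]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx                 # slope in log-log space
    log_a = mean_y - beta * mean_x   # intercept in log-log space
    return beta, math.exp(log_a)
```

Repeating the fit on data restricted to a single month or season would give the temporal variation of the exponent that the abstract refers to.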
Mining Urban Performance: Scale-Independent Classification of Cities Based on Individual Economic Transactions
Intensive development of urban systems creates a number of challenges for
urban planners and policy makers who seek to maintain sustainable growth.
Running efficient urban policies requires meaningful urban metrics, which could
quantify important urban characteristics including various aspects of actual
human behavior. Since city size is known to have a major, yet often
nonlinear, impact on human activity, it also becomes important to develop
scale-free metrics that capture qualitative city properties, beyond the effects
of scale. Recent availability of extensive datasets created by human activity
involving digital technologies creates new opportunities in this area. In this
paper we propose a novel approach of city scoring and classification based on
quantitative scale-free metrics related to economic activity of city residents,
as well as domestic and foreign visitors. It is demonstrated using the example of
Spain, but the proposed methodology is general. We employ a new
source of large-scale ubiquitous data, which consists of anonymized countrywide
records of bank card transactions collected by one of the largest Spanish
banks. Different aspects of the classification reveal important properties of
Spanish cities, which significantly complement the patterns that might be
discovered with official socioeconomic statistics. Comment: 10 pages, 7 figures, to be published in the proceedings of ASE
BigDataScience 2014 conference
Collective Prediction of Individual Mobility Traces for Users with Short Data History
<div><p>We present and test a sequential learning algorithm for the prediction of human mobility that leverages large datasets of sequences to improve prediction accuracy, in particular for users with a short and non-repetitive data history such as tourists in a foreign country. The algorithm compensates for the difficulty of predicting the next location when there is limited evidence of past behavior by leveraging the availability of sequences of other users in the same system that provide redundant records of typical behavioral patterns. We test the method on a dataset of 10 million roaming mobile phone users in a European country. The average prediction accuracy is significantly higher than that of individual sequence prediction algorithms, primarily constant-order Markov models derived from the user’s own data, that have been shown to achieve high accuracy in previous studies of human mobility. The proposed algorithm is generally applicable to improve any sequential prediction when there is a sufficiently rich and diverse dataset of sequences.</p></div>
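The baseline named in the abstract, a first-order Markov model built from the user's own data, can be sketched as follows. This is an illustrative reading under stated assumptions rather than the authors' implementation. It assumes the model predicts the most frequent successor of the current location and is updated only after each prediction; the names `FirstOrderMarkov` and `sequential_accuracy` are hypothetical.

```python
from collections import Counter, defaultdict


class FirstOrderMarkov:
    """Order-1 Markov predictor: most frequent successor of the current location."""

    def __init__(self):
        # counts[prev][nxt] = number of observed transitions prev -> nxt
        self.counts = defaultdict(Counter)

    def update(self, prev, nxt):
        self.counts[prev][nxt] += 1

    def predict(self, current):
        followers = self.counts.get(current)
        if not followers:
            return None  # no evidence yet for this location
        return followers.most_common(1)[0][0]


def sequential_accuracy(sequence):
    """Predict each next location, then learn from it (online evaluation)."""
    model = FirstOrderMarkov()
    correct = 0
    for prev, nxt in zip(sequence, sequence[1:]):
        if model.predict(prev) == nxt:
            correct += 1
        model.update(prev, nxt)  # the model only sees the outcome afterwards
    steps = len(sequence) - 1
    return correct / steps if steps > 0 else 0.0
```

On a short sequence the model fails on every transition it has not yet seen, which is the weakness for users with little data history that the abstract describes.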
Correct/incorrect prediction for given position for three selected sequences.
<p>Prediction accuracy in our setup depends crucially on the availability of good experts in the ensemble. In the three example sequences we see a color-coded depiction of prediction success or failure adjacent to the numbers of awake and best experts, i.e. experts that can provide a prediction at a given step, and those among them which have accumulated the minimum loss up to that step. The three sequences are rather typical examples seen in the test dataset. Low numbers of best and awake experts almost invariably lead to incorrect predictions, and vice versa.</p>
Comparison with the best expert in the ensemble.
<p>The best expert here is declared at the end of the sequence, as the Markov model in the expert ensemble which accumulated the minimum loss during prediction. If more than one expert shares this property, a representative is chosen arbitrarily. (A) The EW forecaster’s prediction accuracy compared to the best expert prediction accuracy. The forecaster’s accuracy is superior more often than not, and with larger differences, resulting in a 4% average advantage. (B) The <i>O</i>(1) Markov model constructed sequentially from the user’s own locations as they are recorded in real time is less accurate than the best expert for a large majority of the test sequences. It may appear slightly surprising that other users’ data is better at predicting a given user’s location sequence, but the user’s own Markov model is constructed sequentially, needing time to learn the patterns, while experts’ Markov models enter the “competition” fully constructed.</p>
Prediction accuracy per position and per hour of the day.
<p>(A) Average prediction accuracy per position <i>n</i> in the sequence, for the EW forecaster and Markov models of order <i>k</i> = 1, 2, 3. The best Markov model is <i>O</i>(1) and is on par with the EW forecaster for the first half-day after the start of the user’s sequence and the prediction process. EW achieves a stable (average) lead after that point. The quasi-periodic pattern is due to the fact that most roamers arrive in the visited country during the day, combined with the fluctuation between day and night prediction accuracies seen in (B). Prediction accuracy is significantly higher in the period between 02:00–08:00 because of the much higher regularity of mobility patterns during these hours.</p>
EW forecaster prediction accuracy.
<p>(A) Percentage of sequences predicted with a certain accuracy (in bins of 10%) for the EW forecaster and Markov models of order <i>k</i> = 1, 2, 3 constructed sequentially from the user’s own data as the sequence of locations is observed in time. We use a learning rate <i>η</i> = 3. The EW forecaster improves on the performance of the best Markov model, which again turns out to be <i>O</i>(1) [<a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0170907#pone.0170907.ref027" target="_blank">27</a>, <a href="http://www.plosone.org/article/info:doi/10.1371/journal.pone.0170907#pone.0170907.ref032" target="_blank">32</a>], by an average of 5%. A detailed comparison between the two is depicted in (B), the scatterplot of difference in prediction accuracy per sequence. For more than 90% of the test sequences, the EW forecaster is more accurate.</p>
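One way to read the exponentially weighted (EW) forecaster with "awake" experts described in these captions is sketched below. The exact update rule is not given in the text, so everything here is an assumption: each expert is represented as a plain dictionary mapping a location to its predicted next location, an expert is awake when it has an entry for the current location, the loss is 0/1, and each awake expert votes with weight exp(−η · cumulative loss). The function name `ew_forecast` is hypothetical; only the learning rate η = 3 comes from the caption.

```python
import math
from collections import defaultdict


def ew_forecast(experts, sequence, eta=3.0):
    """Online accuracy of an exponentially weighted vote over awake experts.

    experts:  list of dicts, location -> predicted next location (assumed form).
    sequence: observed locations of the test user, in time order.
    """
    losses = [0.0] * len(experts)  # cumulative 0/1 loss per expert
    hits = 0
    for current, actual in zip(sequence, sequence[1:]):
        # Awake experts are those able to predict from the current location.
        awake = [i for i, e in enumerate(experts) if current in e]
        votes = defaultdict(float)
        for i in awake:
            votes[experts[i][current]] += math.exp(-eta * losses[i])
        if votes:
            prediction = max(votes, key=votes.get)
            hits += prediction == actual
        # Only awake experts are charged a loss for this step.
        for i in awake:
            if experts[i][current] != actual:
                losses[i] += 1.0
    steps = len(sequence) - 1
    return hits / steps if steps > 0 else 0.0
```

With η = 3, a single mistake cuts an expert's weight by a factor of about 20, so the vote quickly concentrates on the experts with the lowest accumulated loss. A step with no awake expert counts as an incorrect prediction in this sketch.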
Prediction accuracy dependence on sampling and <i>T</i><sub><i>past</i></sub>.
<p>(A) Average prediction accuracy for particular filterings of the expert ensemble. We randomly sample experts from the ensemble and additionally we filter the experts’ sequence fragments so that only those that end within a time window <i>T</i><sub><i>past</i></sub> are included. Decreasing the sampling rate and/or reducing <i>T</i><sub><i>past</i></sub> decimates the ensemble, and beyond a certain point this hurts the accuracy of the forecaster. (B) The average percentage of distinct transitions <i>X</i><sub><i>n</i>−1</sub> → <i>X</i><sub><i>n</i></sub> in a test sequence that are contained by at least one expert in the ensemble after filtering. Prediction accuracy in (A) starts dropping when the sampling rate is reduced below a few percent, showing that the ensemble is very diverse and robust. A very slight drop in performance comes with including all experts, due to the logarithmic search costs of the forecaster when the ensemble grows.</p>
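The quantity plotted in panel (B), the share of distinct transitions in a test sequence that appear in at least one expert, can be computed as in the sketch below. It assumes the experts are available as raw location sequences after filtering; the function name `transition_coverage` is hypothetical.

```python
def transition_coverage(test_sequence, expert_sequences):
    """Fraction of distinct test transitions found in at least one expert sequence."""
    test_transitions = set(zip(test_sequence, test_sequence[1:]))
    if not test_transitions:
        return 0.0
    # Collect every consecutive pair seen in any expert sequence.
    known = set()
    for seq in expert_sequences:
        known.update(zip(seq, seq[1:]))
    return len(test_transitions & known) / len(test_transitions)
```

Evaluating this on progressively smaller random samples of `expert_sequences` would show how quickly coverage falls as the ensemble is thinned out.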